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PREFACE 

In  this  manuscript,  we  develop  the  theory  of  smoothing  sphne  growth  curves. 
Our  primary  application  is  the  estimation  of  temperature  and  density  profile  shape 
variation  in  fusion  plasmas.  We  parameterize  the  temperature  profile  of  the  j-th 
discharge  as 

\n{T,{r,q))  =  C,  +  Mr)  +  Mr)\nq 

where  fo{r)  and  f\{r)  are  unknown  functions  which  we  estimate  using  smoothing 
splines. 

Our  present  treatment  improves  on  the  regression  spline  treatment  of  Ref.  [22] 
and  Ref.  [25]  by  including  a  smoothness  penalty  function  and  thereby  removing  the 
sensitive  dependence  on  the  knot  locations. 

The  original  version  of  this  manuscript  was  prepared  on  February,  1991. 
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Abstract 

The  interactive  spline  model  of  Waliba  is  sj^ecialised  to  growlli  cuives.  Tlie  smooth- 
ing spline  formulation  permits  a  nonparametric  representation  of  the  growth  curves.  In 
the  limit  when  the  distance  between  knots  is  small  relative  to  the  characteristic  scale  of 
variation  of  the  unknown  functions,  the  resulting  growth  curve  estimates  depend  only 
weakly  on  the  number  and  locations  of  the  knots.  The  smoothness  parameter  is  de- 
termined from  the  data  by  minimizing  the  expected  error  or  maximizing  the  goodness 
of  fit. 

KEYWORDS:  Growth  Curves,  Smoothing  Splines,  Generalized  Cross-validation,  Ran- 
dom Coefficient  Models,  Multivariate  Analysis,  Bayesian  Regression. 


I.  Introduction 

Growth  curve  analysis  is  used  to  parameterize  a  family  of  temijoral  curves  whose  shapes 
depend  on  a  vector  of  covariates  u  (Potthoff  and  Roy  (19C4),  Grizzle  and  Allen  (19G9), 
Geisser  (1980)).  As  an  example,  we  consider  the  heights  of  children  as  a  function  of  time 
or  age  and  of  a  number  of  covariates,  both  discrete  covariates  such  as  sex,  and  continuous 
covariates  such  as  drug  dosages.  We  are  primarily  interested  in  the  ceise  where  each  in- 
dividual has  been  measured  a  large  number  of  times,  so  that  the  growth  as  a  function  of 
time  may  be  treated  nonparametrically.  We  assume  that  the  number  of  individuals  is  small 
enough,  or  the  covariate  dependencies  are  simple  enough  that  the  covariate  dependencies 
may  be  treated  parametrically  for  each  fixed  time. 

The  general  theory  of  growth  curves  allows  arbitrary  basis  functions  and  Bayesian  pri- 
ors (Lee  and  Geisser  (1972),  Rao  (1975),  Strenio  et  al.  (19S3)).  In  practice,  however,  the 
basis  functions  are  usually  eissumed  to  be  polynomials  in  time.  Similarly,  the  covariance 
matrix  of  the  Bayesiein  prior  is  usually  a  diagonal  matrix,  or  is  determined  empirically 
from  repeated  measurements,  or  is  given  a  low  order  autoregrossive  structure.  Smoothing 
splines  is  a  powerful  technique  used  to  reconstruct  nonparametric  curves  and  surfaces  (see 
Silverman  (1985),  Eubank  (1988),  MuUer  (1988),  Wahba  (1990)  for  reviews).  The  smooth- 
ness penalty  function  corresponds  to  a  Bayesian  prior.  Recently,  Wahba  has  developed 
a  general  theory  of  interactive  smoothing  spHne  models  (Ch.  10,  Wahba  (1990)),  and  in 
this  brief  communication,  we  adapt  interactive  spline  models  to  Bayesian  growth  curve 
analysis.  Our  nonparametric  growth  curv'e  models  may  also  be  viewed  as  a  variant  of  mul- 
tivariate adaptive  regression  models  (Freedman  (1991)),  which  is  appropriate  when  only 
one  of  the  covariates  is  "timelike"  and  therefore  needs  to  lie  treated  nonparametrically. 

Smoothing  techniques  are  widely  used  for  the  nonparametric  determination  of  a  single 
growth  curve,  including  the  case  of  repeated  measurements  (Largo  (1978),  Gasser  et  al. 
(1984)).  For  families  of  covariate  growth  cmves,  however,  there  has  been  surprisingly  little 
nonparametric  work.  Hardle  and  Marron  (1990)  consider  families  of  growth  curves  which 
differ  from  each  other  by  a  horizontal  shift  and  a  scale  transformation.  Partial  linear  mod- 
els (Rice  (1986),  Heckman  (198C),  Speckman  (1988))  consider  growth  curves  where  the 
parametric  part  of  the  model  depends  on  covariates,  but  the  nonparametric  part  depends 


only  on  time:  fx{t,^,,i)  =  (^,  •  /^  +  /(O-  ^^'^  (19S9)  considers  families  of  grouped  growth 
curves,  and  uses  smoothing  splines  to  determine  both  the  individual  growth  curves  and 
the  main  effect  growth  curves.  Due  to  the  repeated  measurement  structure  of  his  model, 
he  is  able  to  determine  each  nonparametric  function  separately.  A  multidimensional  non- 
parametric  ANOVA  decomposition  is  given  in  Gu  and  Wahba  (1991).  Rice  and  Silverman 
(1991)  treat  a  collection  of  individual  growth  curves,  f,{t),  which  are  distributed  with 
mean,  foit),  and  variance  S(^,^').  They  approximate  E(t,/')  by  its  first  few  principal 
components:  'E{t,t')  ~  ^(-^  geii)ge{t'),  and  then  estimate  both  the  mean  and  the  largest 

Of  the  two  main  smoothing  techniques,  kernel  smoothing  and  smoothing  splines,  we 
concentrate  on  smoothing  splines  because  they  generalize  more  natvually  to  growth  curves 
with  covariates.  Despite  its  generality,  we  are  unaware  of  any  growth  curve  analysis  where 
the  nonparametric  part  of  the  model  depends  on  covariates  through  the  standard  sum  of 
products  structure.  Similarly,  we  are  vmawareof  any  growth  curve  analysis  with  the  sum  of 
products  covariate  structure  where  the  basis  representations  are  regression  splines  except 
our  own  (Kardaun  et  al.  (19SS),  McCarthy  et  al.  (1991)). 

We  briefly  review  smoothing  splines  for  a  single  growth  ciuve  with  n  independent 
measurements:  y,  =  fo(t)  +  e,  where  e,  ~  A''(0,cr^).  We  represent  the  curve  as  a  sum  of 
piecewise  cubic  polynomial  basis  functions:  fo{t)  =  '}2k=\  ctkBi^it).  The  free  parameters, 
a/t,  are  determined  by  minimizing  the  functional 

a     —  a^  Jo 

where  7  is  a  positive  integer  with  ")  >  2.  When  the  smoothing  parameter.  A^,,  is  zero  or 
too  small,  the  spline  coefficients  become  ill-conditioned  and  spurious  oscillations  develop 
with  a  wavelength  proportional  to  twice  the  the  distance  l^ietween  knots  (Wegman  and 
Wright  (1983)).  Two  remedies  for  this  ill-conditioning  are  to  decrease  the  uumlier  of  Ijasis 
functions  and  to  increase  the  smoothness  parameter  A^.  Adjusting  the  spline  representation 
is  usually  a  very  complicated  optimization.  Therefore  researchers  have  concentrated  on 
finding  methods  which  determine  the  optimal  value  of  A^,  from  the  data.  The  smoothness 
penalty  function  then  shrinks  the  a.  prion  probaliility  of  oscillatory  behavior  to  a  small 
and  acceptable  level. 


For  a  single  function,  the  minimization  of  Eq.  1  can  lie  clone  o\er  an  infinite  dimensional 
Sobolev  space,  and  the  resulting  mininnnn  is  a  ciil)ic  spline  fimction  with  knots  at  all  the 
measurement  locations,  hi  practice,  man}-  fewer  knots  may  be  sufficient  to  represent  the 
unknown  function,  fo{t)-  This  class  of  representations  is  called  hyl)rid  spline  models,  hi  the 
covariate  growth  curve  models  of  Sec.  II,  the  measurement  times  are  usually  nonuniform, 
and  we  normally  consider  hybrid  spline  models  with  significantly  fewer  knots  than  the  total 
number  of  distinct  measurement  times. 

To  specify  the  spline  representation,  the  number  and  location  of  the  spline  knots  as 
well  as  their  order  of  continuity  must  l)e  given.  The  rachal  resolution  of  the  fitted  function 
is  bounded  by  the  time  interval  between  measurements,  and  scales  with  distance  between 
knots.  In  the  limit  when  the  distance  between  knots  is  small  relative  to  the  characteristic 
scale  of  variation  of  the  unknown  function,  the  fitted  function  depends  only  weakly  on  the 
locations  of  the  knots. 

II.  Seniiparametric  Growth  Curve  Models 

For  covariate  growth  curves,  we  consider  a  family  of  A'  individuals  with  the  7th  in- 
dividual having  mea^jured  responses  at  ?(,  timepoints,  <i  ,, . .  .  <„_  ,.  We  assume  the  covari- 
ates  are  fixed  in  time  for  the  zth  individual  and  denote  their  value  by  the  M  vector,  u,. 
We  denote  the  n,  vector  of  the  ?th  measurement  times,  (<i.,, ...  <«,.,)'  by  t,.  We  denote 
the  n,  vector  of  measured  responses,  ( ?yi ,,.••.  ?y,i,,, )'  by  ij,  anfl  the  vector  of  true  values, 
(/i(/i.,,  u,,  i),  ...//(/„_.,,  ?7,,  ?))'  l)y  /7,(/,,(7,).  Although  we  are  i)riniarily  interested  in  data 
which  is  grouped  by  individual,  niultii)le  time  measurements  of  each  individual  are  not 
necessary.  Uncorrelated  measurements  may  be  treated  with  ??,  =  1. 

We  divide  the  model  for  i.i{t,u.  i)  into  a  parametric  part,  /;( f.  (7,  ?),  and  a  nonparametric 
part,  /(<,  u,  i).  The  parametric  part  of  the  model,  h(t,  t7,  0,  ?),  is  assumed  to  have  a  specific 
known  functional  form:  h{t.u.l3,i)  —  J2''j-i  hj(t.u,i)3j,  where  the  hj(t.u,i)  are  known 
functions  ajid  the  /5j  are  undetermined  free  parameters.  We  allow  both  the  parametric  and 
nonparametric  terms  to  depend  on  the  specific  individual  to  allow  for  fixed  efi"ects  models. 

We  therefore  consider  growth  curve  models  of  the  form 

L 

^l{t.u.^)  =  M/.7.J,o  +  Mi)  +  E/HO.f/H'7,0-  (2) 


We  assume  that  the  temporal  structure  of  the  f({t)  is  sufficiently  complex  that  it  re- 
quires a  nonparametric  representation.  We  assume  that  a  low  order  parametric  repre- 
sentation is  sufficient  to  treat  the  covariate  dependencies,  (j({u,i).  When  the  last  term, 
IZfci  fe{^)9({^^^)i  is  omitted,  our  models  reduce  to  the  standard  partial  linear  spline  mod- 
els. Our  sum  of  products  covariate  spline  models  resemble  generalized  additive  models, 
and  we  expect  many  of  the  same  techniques  and  theorems  to  apply  (Friedman  and  Stuetzle 
(1981),  Hastie  and  Tibshirani  (1990)).  Our  covariate  spline  models  are  a  slight  generaliza- 
tion of  a  special  case  of  Wahba's  interactive  spline  model. 

In  most  applications,  the  covariate  basis  functions.  (jf(if],  are  linear,  (u,„  —  u„J,  or 
quadratic,  and  are  centered  about  the  database  mean  or  are  discrete  variables.  If  the 
nonparametric  part  of  each  individual  curve  is  considered  noise,  a  nonparametric  function 
may  be  included  for  each  separate  individual:  (jf(u,i)  =  6(,,. 

In  our  study  of  profile  variation  (McCarthy  et  al.  (1991)),  the  data  consisted  of  N 
sets  of  temperature  measurements  at  15  different  radial  locations  in  a  fusion  plasma,  with 
10  <  A''  <  100.  The  radial  location,  r,  is  the  timelike  variable,  and  the  main  covariate  is 
q,  the  ratio  of  the  total  electric  current  to  the  average  magnetic  field.  The  shape  of  the 
profiles  is  much  less  variable  than  the  overall  magnitude,  therefore  each  individual  curve 
was  given  its  own  own  intercept:  hj{t,  ii,  i)  =  6j^,.  Thus  the  log  linear  nonparametric  model 
of  (McCarthy  et  al.  (1991))  is 

/r7[T(r,5,0]  =  a  +  foir)  +  Mr)ln{q)  . 

We  assume  that  both  the  number  and  choice  of  both  the  parametric  basis  functions, 
hj{t,u,i),  and  the  covariate  basis  functions,  gf{u,i)  are  given  a  prion.  In  practice,  both 
sets  of  basis  functions  are  often  determined  iteratively  using  a  combination  of  statistical 
common  sense  and  the  minimizations  of  weighted  siuns  of  residual  errors  and  smooth- 
ness/degree of  freedom/predictive  error  penalty  fimctions. 

Each  of  the  temporal  curves,  fe(t),  is  given  a  radial  representation:  fe{t)  =  J2k=\  <^  k(Bk(t), 
where  the  5fc(<)  are  basis  functions.    We  recommend  the  choice  of  B  splines  for  the  5/, 
(deBoor  (1978)).     B  splines  are  a  particular  reparameterization  of  the  standard  piece- 
wise  polynomial  splines  which  have  the  advantage  that  each  function  is  spatially  localized. 
Thus  the  resulting  design  matrix  will  have  a  band  structure.    In  practice,  the  innermost 


and  outermost  basis  functions  are  often  restricted  to  l)e  linear  in  time. 

When  the  temporal  design  is  uniform,  /,  =  tj,  the  knot  positions  may  be  chosen  at 
all  the  distinct  measurement  times.  If  each  individual  is  measmed  at  slightly  different 
times,  <p_,  =  tp  +  tp_,,  we  can  choose  the  knot  locations  at  the  typical  measurement  times, 
tp.  Alternatively,  we  can  place  a  knot  at  every  distinct  measurement  time:  <p,,.  The 
value  of  the  additional  knots  in  reducing  the  model  error  depends  on  the  ratio  of  the 
typical  time  between  measurements,  fp+i  —  tp,  to  the  the  spread  of  the  measurment  times, 
atp  where  a^^  =  jj  XI,_]  i^,.  When  the  spread  of  times  is  small  relative  to  the  typical 
time  between  measurements,  the  additional  knots  reduce  the  model  error  primarily  in  the 
regions,  pp  — cr,p,?p  +  crtp],  and  the  reduction  in  the  model  error  in  the  complementary  region 
will  be  low.  In  this  ceise,  we  typically  choose  the  first  alternative,  knot  locations  only  at 
tp.  The  number  and  location  of  the  knot  positions  can  be  determined  Ijy  convergence  tests 
or  by  the  data-based  parameter  selection  criteria  of  Sec.  III. 

We  denote  the  A'  x  (£  -|-  1 )  matrix  whose  elements  are  a  ty,  i  =  0. . .  .  L.  k  —  \, .  .  .  K  by 
a  ,  and  the  ^th  column  vector  l)y  at:  ar^  =  a  i;f.  The  n,  x  7v'  temporal  design  matrix  for  the 
nonpaxametric  part  of  the  zth  individual  is  denoted  l)y  X,  with  elements  A'^  ;^.  =  Bi,(tp,,). 
Similarly,  the  parametric  part  of  the  design  matrix  is  defined  by  Hpy  =  hj<(tp,,,  u,,  i).  Thus 
the  zth  individual  is  parameterized  by 

P,(t,,u,]  =  H,/3  +  X,a.7(u,,  7)  (3), 

where  g{u,,i)  =  (1,  <7i(u,,  z), .  . .  (7/,(?7,,  ?))'  ,  and  a  is  the  K  x  (L  +  I)  matrix  of  unknown 
parameters.  Our  construction  of  the  nonparametric  design  matrix  sjiecifically  assmnes 
that  the  covariates,  j7,,  are  time  independent.  If  the  covariates  are  time  dependent,  the 
Kronecker  product  design  structure.  (g{u,.i)'  M  X,).  should  be  replaced  by  X 'c,-  ,,y/^+^-  = 
Bk{tp,)g({u,{tp,),i\  and  a  is  replaced  by  og-  the  concatenation  of  the  columns  of  o- . 

The  error  structure  is  assumed  to  be  independent  between  individuals,  but  arbitrary 
within  each  individual.  The  n,  x  11,  covariance  matrix.  E,,  of  the  measurement  errors 
may  be  either  given  or  estimated.  Thus  our  models  include  random  effects  models  for  the 
nonparametric  part. 

To  determine  the  free  paraineter  matrix  in  our  smooth  spline  growth  curve  model,  we 


minimize  the  functional 

where  7  is  an  integer  with  7  >  2.  The  second  term  consists  of  a  separate  smoothness 
penalty  function  for  each  of  the  f(^  (t).  The  sm.ooi.hing  parameters,  A^,  control  the  trade- 
off between  the  goodness-of-fit,  Ilf(yi  -  /7,(<,,  u,,  cv , /i))'S  7'(y,  -  il,{t,,u,,a  ,0))  and  the 
smoothness  of  the  solution.  If  necessary,  the  utility  functional  may  be  robustified  in  the 
standard  ways.  Our  analysis  generalizes  to  smoothing  surfaces  with  covariates.  In  other 
words,  if  time,  t,  is  replaced  by  several  nonparametric  variables  such  as  time  and  age,  our 
formulas  remain  valid  if  time  is  replaced  by  t  and  /,,.''  (t)  is  replaced  by  M/c(0,  where  M 
is  a  second  order  elliptic  operator. 

Each  penalty  function  may  be  represented  parametrically  as  Jo  fe  {t)^dt  —  cJ^S  a^  , 
where  S  ^^^t'  =  /J  Bl''{t)Bl!'{t)dt  .  Differentiating  Eq.  4  with  respect  to  Of  yields 

'£ge{u,,i)(X\^:'X,ag{i;,.r)-X[i::'{y,-H,ii))+XS&f    =    0  {5a). 

The  corresponding  variation  of  Eq.  4  with  respect  to  1:1,  the  parametric  part  of  the  model, 
yields: 

f;H;E-'H,^=f;H;Er'(y. -X,6fl(J.,0)  (56). 

1=1  1=1 

This  is  a  linear  system  of  {L  +  1)A'  +  .7  unknowns.  The  optimal  values  of  the  (I  +  1) 
smoothing  parameters,  A^,  are  unknown  and  also  may  lie  determined  from  the  data. 

Alternatively,  if  one  or  more  of  the  nonparametric  functions  are  known  to  be  monotone, 
it  may  be  possible  to  eliminate  the  f  th  penalty  function  term,  A^S  07,  in  Eq.  5a  and 
estimate  fe{t)  using  monotone  splines.  (See  Ramsay  (19SS)  for  a  review  of  univariate 
monotone  spline  theory. ) 

For  the  case  of  partial  linear  spline  models,  Speckman  (19SS)  proposed  an  alterna- 
tive estimator,  and  for  a  special  class  of  design  structures,  proved  that  it  resulted  in  a 
lower  order  bias  for  the  parametric  terms.  Sj^eckman  begins  by  estimating  the  para- 
metric terms  from  the  nonsmooth  part  of  the  model,  and  then  estimates  the  parametric 
part  of  the  model  given  his  estimate  of  /i.    For  our  situation,  the  analogous  procedure 


would  be  to  estimate  a  smoothed  estimate  of  each  iiuhvidual  scijarateiy:  y^,  =  K  ,(\,)y, 
where  K,(A,)  =  X,  fXJS,"'^'  +  'VSJ  X,'E~'.  \\'e  then  dctiue  the  uousmooth  (luau- 
tities,  y,  =  i/,  —  K,(A,)!/,  and  H,  =  H,  —  K,(A,)H,.  We  then  estimate  ,i  Ijy  solving 
5Z,_j  H,'S~  H  ,/i  =  '}2',=\  H|S~  j7r  ^^*'  hope  to  examine  this  and  other  generalizations  of 
Speckman's  estimator  under  a  variety  of  different  design  structures  in  fvitvue  work. 

We  now  reformulate  Eq.  5  as  <i  generalized  ridge  regression  with  K(  L  +  l)  +  J  \nd<nowns 
and  Nt  measurements,  where  .Vr  =  Jl,=]  "'•  ^^'•'  concatenate  the  cohnnns  of  the  ?;  infli- 
vidual  measurements  into  a  Nj  vector.  Vc.  and  the  columns  of  o  and  /i  into  a  K{L  +  1 )  +  J 
vector  Oc  :  o^  =  (aj,,  ft2...cri,, /?')■  The  Xj  x  A'(I  +  1)  +  ./  design  matrix  consists  of  the 
concatenation  of  the  N'  matrices  (ff(v,.rY  (>0  X,,  H,),  or  for  time  de]iendent  covariates, 
(Xg,.,  H.). 

The  total  covariance  matrix.  El^-.  consists  of  diagonal  matrix  entries.  S,.  The  to- 
tal penalty  function,  Sf.(A).  consists  of  diagonal  matrix  entries  XfS  :  S  ( A  )cK+/,/'K+i'  = 
^e^  k,k'ie,e'  for  0  <  (  <  L  and  zero  otherwise.  In  this  forundation.  Ecj.  5  is  transformed  to 

(X^E7'X,  +  S,(A)),^,(A)  =  X'.E:'f  (G). 

Our  interactive  spline  models  are  a  special  class  of  mixed  models.  As  such,  the  covariance 
structure  may  l)e  parametrized  with  a  free  jiarameter  vector.  9:  S  ^  =  ^  c(9).  and  6 
may  be  estimated  using  a  restricted  maximum  likelihood  estimator  (Harville  ( 1977)).  The 
Bayesian  posterior  covariance  of  the  discrinized  syst(nn  is 

Cor(n,o;_)  =  ('(X;.E7'X,)"'  +S,(A) 
where  Sc(A)~  is  the  Moore-Penrose  generalized  inverse  of  S(-(A). 

III.  Data-based  Parameter  Determination 

The  optimal  values  of  the  smoothing  parameters.  A.  are  unknown,  and  we  endeavor  to 
estimate  them  from  the  data.  Most  data-liased  estimates  determine  the  free  j)arameters 
by  minimizing  a  functional  of  the  data.  Risk-based  methods  minnnize  a  functional  related 
to  the  predictive  error,  appropriately  weighted  (Hall  and  Titterington  (19S7)).  Goodness 
of  fit  methods  minimize  the  residual  sqviared  error  weighted  !)y  a  measure  of  the  number 
of  effective  degrees  of  freedom. 


Recent  research  indicates  that  expected  loss/rislv  criteria  will  converge  more  quickly  to 
the  optimal  value  of  the  smoothing  parameters  (Hall  et  al.  (1991),  Gasser  et  al.  (1991)). 
These  results  show  that  the  relative  rate  of  convergence  of  goodness  of  fit  estimators  is 
very  slowly,  0(n~'''*°),  to  the  optimal  value  of  the  smoothing  parameter  bandwidth.  In 
contrast,  if  the  smoothing  parameter  is  determined  by  minimizing  an  estimate  of  the 
expected  loss,  relative  convergence  rates  up  to  0(7?"''')  have  been  achieved.  The  reader 
should  be  cautioned  that  the  selection  of  desirable  fmictionals  for  data  driven  parameter 
selection  is  an  active  research  topic,  and  that  the  initial  research  has  concentrated  on 
univariate  kernel  smoothers. 

Most  of  these  risk-based  functional^  rely  on  the  "plug  in"  ai:)proximation;  namely, 
the  desired  functional  involves  unknown  parameters  such  as  o*,.  and  the  variance.  These 
methods  replace  the  unknown  parameters  with  estimated  values,  a-p. 

We  now  present  a  class  of  risk-leased  functionals,  and  then  several  common  goodness 
of  fit  functionals.  These  functionals  do  not  require  the  growth  curve  structure  which  we 
discussed  previously.  Instead,  we  require  only  that  the  covariance  structure,  E  e,  is  known. 
We  begin  by  defining  the  matrix,  R(A),  and  the  influence  matrix,  Ac-(A): 

R(A)=  (X^E7'X,  +  S,(A))''    ,  A,(A)  =  X,(x;,E7'X,  +  S,(A))"'x:.E7'.      (8) 

The  influence  matrix  is  not  a  projection  due  to  the  penalty  matrix,  S(.(A):  -4c(A)  > 
Ac{\)Ac{\),  and  therefore  the  concept  of  eff'ective  degrees  of  freedom  is  tenuous.  In  assesing 
the  espected  error,  we  neglect  model  error  including  discretizaion  error.  The  total  expected 
error  in  c?c(A)  from  the  sampling  perspective  is 

Exp\Cac(\)  -  a,){S,{\)  -  a,Y\    =    R(A)X  ^E  ;'X  ,R(A)   +   R(  A)S  e(A)a,a^S,(A)R(  A). 

(9) 
R(A)X^Ej'XcR(A)  is  the  variance,  and  R(A)S ,( A)o,o;,S ,( A)R(  A)  is  the  bias  error. 
For  any  positive  semidefinite  matrix,  M  ,  we  can  se-lect  A  by  minimizing 
Trace{M  Exp\{'ac{\)  -  cVc)('^.-(  A)  -n,)'|)  = 


Tract{M  [R(  A)X  ^E  7'X  ,R(  A)   +   R(A)S  ,(A)o,a^S  ,( A)R(  A)J )  ,  (lOo) 

with  respect  to  A.  When  M  =  X  'E  ~'X  ,  the  risk  estimate  corresponds  to  minimizing  the 
predictive  error. 


Since  the  true  value  of  0"^-  is  unknown,  we  can  sul)stitute  nj.(A)  into  tlie  bias  error.  Tlie 
direct  plug-in  of  Qc(-^)  into  Eq.  9  is  sul)oi)timal.  Instead,  we  select  the  plug-in  estimate, 
ap(Ap),  to  minimize  the  error  in  the  bias  term  of  Eq.  10a.  Thus  we  recommend  the  plug-in 
estimator,  ap(Ap),  which  minimizes 

Trace(S,(A)R(A)MR(A)S,(A)£:.rp[(^,(Ap)-a,)(^,(Ap)-o,)'])    = 

rrace(Se(A)R(A)MR(A)S,(A)  [R(Ap)X  ^E;'X,R(Ap)   +  R(Ap)S  ,(Ap)Q,o^S  ,( A,)R(  Ap)] 

(106) 
Note  that  both  A  and  Ap  appear  as  parameters  in  Eq.  10b.  Thus  the  plug-in  estimatator, 
Qp(Ap),  depend  indirectly  on  the  value  of  A  in  the  risk  functional  of  Eq.  (10a).  Equation 
10b  also  contains  the  unknown  a^.  which  needs  to  he  estimated  with  another,  possibly 
different  value  of  A.  Ideally,  the  estimator  of  n,-  in  Ecj.  lOlj.  o^,,(Xj,,.),  should  be  chosen  to 
minimize 


Trace(S, (Ap)R(Ap)S , (A)R(A)M  R(A)S,(A)R(A,,)S,(Ap)£.rp  [(Opp(Ap,)  -  o,)(opp(App)  -  aj 

(lOr) 
where  Exp  (cVpp(App)  —  Q<.)(app(App)  —  cv^)'  )  is  given  by  Eq.  9  evaluated  at  App.  However 
the  hierarchy  of  equations  is  infinite,  extremely  cumbersome,  and  illconditioned.  Therefore 
we  make  a  low  order  closure,  and  set  n^  equal  to  cv(Ap)  in  Etj.  lOb  or  set  Qc  equal  to  Q(App) 
in  Eq.  10c.  This  chain  of  phig-in  estimates  for  o  is  the  analog  of  the  need  to  estimate 
higher  order  derivatives  in  the  bandwidth  selection  of  ktu-nel  smoothers  (Hall  et  al.  ( 1991 )). 
When  the  covariance  structure,  S^.,  is  known  up  to  an  arbitrary  scalar:  Ec  =  cr^S  ,  a^ 
can  be  estimated  from  the  data  using 


'o 


.2,-^     II  (5r  -  A,(A)};)'E-'(};  -  A.(A)}; 


a^(A)=  '"^  "    '     '  '    L "    '     '".  (11) 

[.Vx-tr(A,(A))] 

where  A  =  X/a'^.  The  influence  matrix.  A^,  depends  on  A  and  rr^  only  through  the 
combination  A/cr^.  and  we  have  defined  A  (.(A)  =  A(A.cr^).  Risk-ba.sed  estimates  of  A.  a^ 
and  a  may  then  be  constructed  by  solving  Eqs.  6.10  (L'  11  iteratively. 

Most  of  the  goodness  of  fit  data-ljased  functionals  (when  a^  is  unknown)  are  of  the 
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form: 

,,,  r^  _    II  i^c  -kAlWcYt  -\\\-  -  A,(A)K)  II 

V    ( A  j  =  -^         =;  ,  (  i  J) 

NtG{X,{\)) 
where  G  is  a  real  valued  function  on  Nj  x  Nj  matrices.  Fvuthermore,  these  functional 
usually  satisfy  G(A,(A))  ~  \.Q-'2Trace\k  ,(\))INt  when  AV  >>  Trnce{k  ^W)  (Hardle, 
Hall,  and  Marron  (1988)).  Some  data-based  functionals  have  the  problem  that  the  absolute 
minimum  often  occurs  at  A  =  0  (no  smoothing).  Generalized  cross-validation  (G.C.V.) 
(Craven  and  Wahba  (1979))  is  probably  the  most  widespread  data-based  functional  of  the 
form  given  by  Eq.  11,  with  G{A.c[\))  =  |l-0  —  Trace{A  c( A))/AV|'.  Another  goodness  of 
fit  functional  is  the  corrected  Akaike  information  criteria  (Hurvich  and  Tsai  (1989)),  which 
is  based  on  the  information  content.  Eqs.  8-12  are  generalizations  of  previously  known 
functionals  to  an  arbitrary  covariance  matrix,  E,-.  When  S  is  known  risk-based  estimates 
of  A  were  proposed  in  Craven  and  Wahba  (1979). 

Alternatively,  we  can  estimate  A  using  ordinary  cross-validation  (Hardle  and  Marron 
(1988)).  Due  to  the  unbalanced  natvue  of  regression  data.  1/A'  corrections  are  necessary 
for  the  efficient  jacknife  estimation  (Wu  (1986)),  and  similar  corrections  may  be  desirable 
for  cross-validated  smoothing  splines.  Unfortunately,  for  small  sample  sizes,  both  G.C.V. 
and  ordinary  cross-validation  often  have  their  absolute  minima  at  no  smoothing  and  result 
in  undersmoothed  estimates  (Hastie  and  Tibshirani  (1990)). 

IV.  Discussion 

Our  growth  curve  models  are  elements  in  the  comi:)osite  Hill)ert  space,  which  is  the 
direct  sum  of  the  finite  dimensional  subspace  of  parametric  effects  and  (i  -I-  1)  infinite 
dimensional  function  spaces.  Thus  we  anticipate  that  the  convergence  results  of  Stone 
(1985)  and  Chen  (1991)  will  be  extendal^le  with  only  minor  modifications,  such  as  non- 
degeneracy  conditions  on  the  g(u,,i.)  and  hj(t,u,,i).  When  the  number  of  nonparametric 
functions,  (L  +  l)  is  fixed,  the  convergence  rate  of  the  covariate  growth  curve  models  should 
be  equal  to  the  convergence  rate  of  one  dimensional  smoothing  splines  (Cox  (1984)).  In 
particular,  the  convergence  rate  of  the  smoothing  sjiline  aiijiroximation  to  a  -,  times  contiu- 
ously  differentible  function  should  l)e  (^.'^  ??,)"'*/'*''■''".  the  optimal  rate  for  one  dimensional 
approximations. 
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We  caution  that  the  higher  order  convergence  rates  usually  require  large  nunil:)ers  of 
datapoints;  many  more  datapoints  tlian  many  growth  curve  studies  have.  The  higher 
order  convergence  rates  also  require  special,  usually  imrealistic,  boundary  conditions  on 
the  functions.  In  the  absence  of  these  boundary  conditions,  the  bias  error  in  the  si)line  fit 
will  increase  dra.stically  at  the  boundary  and  dominate  the  total  error  (Rice  and  Rosenblatt 
(1983)). 

We  are  currently  implementing  a  numerical  code  for  smoothing  si)line  growth  curve 
analysis  with  a  variety  of  different  data-based  functionals  (hnre  and  Riedel  (1992)).  The 
minimization  with  respect  to  the  imknown  parameters.  A,  is  performed  using  a  quasi- 
Newton  steepest  descent  algorithm. 

The  number  and  choice  of  the  ba.sis  functions,  }ij(t,u,i),  and  (je{u),  may  also  be  de- 
termined by  minimizing  the  data-based  fvmctional.  However,  as  the  dimension  of  the 
multivariate  minimization  increa.ses.  these  data-based  functionals  may  have  a  number  of 
relative  minima.  Furthermore,  the  mininuim  of  the  function  is  often  shallow,  and  the  es- 
timated value  of  A  may  converge  slowly  to  its  ojitimal  value  as  N  tends  to  infinity.  The 
convergence  to  the  optimal  value  is  slow  when  the  utility  function,  V'(A),  is  an  insensitive 
function  of  the  smoothing.  On  the  other  hand,  in  these  cases,  the  risk/goodness  of  fit  is 
not  dramatically  worsened  by  the  use  of  a  suboptimal  value  of  A. 

In  practice,  the  variance  in  the  estimated  mean  profile,  fuit).  is  usually  much  smaller 
than  the  variance  in  the  estimates  of  /](/).. .  fr(f)-  Thus  if  all  A^  are  equal,  the  smoothing 
tends  to  be  too  little  for  /o(/)  or  too  much  for  /i(0  . .  .  ff{t)-  To  reduce  the  dimensionality 
of  the  minimization,  we  recommend  that  the  model  restriction:  A]  =  A2  . .  .  ^r  and  A^  7^  A], 

We  conclude  by  noting  that  when  the  temporal  design  is  vuiiform,  i.e.  X  1  =  X.  2  ■  ■  ■  = 
X,v,  Eq.  5  ma}'  be  recast  in  multivariate  form.  For  simplicity,  we  assume  (3  =  0  for  the 
rest  of  this  paragraph.  The  N  observed  [Mofiles,  each  consisting  of  the  same  n  time  points, 
can  be  represented  by  a  ;?  x  .V  data  matrix,  Y  .  The  (I  -f  1  )  x  A'  covariate  data  matrix, 
U  ,  has  colun:ins  g{ui)  through  g(u^)-   In  nuiltivariate  notation,  the  growth  curve  model  is 

Y  =XoU  +E,  (13) 

where  E   is  the  n  x  N  matrix  of  random  errors.    The  first  variation  of  the  the  utility 
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functional  (Eq.  5a)  can  be  rewritten  as 

(X'S-'XcvUU' +SaA)  =X'E-'YU\  (14) 

where  Aisa(L  +  l)x(L  +  l)  diagonal  matrix  whose  elements  are  \e.  Unfortunately,  the 
tensor  product  formulation  does  not  allow  a  solution  based  on  the  separation  of  variables. 
Instead,  the  full  {L  +  1)A"  x  (Z  +  1)A"  system  must  l)e  solved.  A  number  of  authors 
have  proposed  numerically  efficient  approximations  to  Tract  {A  r(A))  when  the  model  is  a 
single  smooth  curve  (Silverman  (19S4)),  and  similar  simplifications  might  be  possible  with 
covariates  provided  that  the  temporal  design  is  imiform. 

A  separable  smoothing  spline  model  may  be  constructed  as  follows.  We  decompose 
UU'  into  ODO'  where  O  is  orthonormal  and  D  is  diagononal.  We  then  replace  the 
growth  curve  model  of  Eqs.  2  it  3  with  the  equivalent  model  with  a"""  =  ft°'''0  and 
g({r,,  i)"^"'  =  0'g{u,,i)°''^ .  This  is  equivalent  to  replacing  the  old  penalty  function  for  the 
functions  fe{t)  with  a  martrix  valued  penalty  function.  The  separable  system  splits  the 
growth  curve  problem  into  L  +  1  independent  one  dimensional  problems.  For  the  separable 
problem,  the  one  dimensional  convergence  residts  may  be  extended  trivially. 

In  conclusion,  we  have  applied  the  interactive  smoothing  spline  methodology  to  growth 
curve  analysis.  Growth  curves  characteristically  have  much  temporal  structure  and  res- 
olution and  relatively  poorly  resolved  covariate  structure.  Thus  we  apply  nonparametric 
smoothing  splines  in  the  temporal  representation  and  simple,  parametric  representations 
in  the  covariate  directions.  This  class  of  models  does  not  require  nudtii>le  time  measure- 
ments, but  the  class  is  well  suited  to  this  structure.  In  other  situations,  there  may  be  only 
enough  data  in  the  temporal  direction  for  parametric  representations  or  sufficient  data  in 
the  covariate  directions  for  a  nonparametric  representation. 
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